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Abstract 

o 

We compare the Hybrid Monte Carlo (HMC) and the Kramers equation algo- 
rithms for simulations of QCD with two flavors of dynamical Wilson fermions and 

^ gauge group SU (2). The results for the performance of both algorithms are obtained 

on 6 3 12, 12 4 and 16 4 lattices at a pion to p meson mass ratio of m^/nip 0.9. We 

^ hnd that the Kramers equation algorithm gives an equally good performance as the 

HMC algorithm. We demonstrate that the classical equations of motion used in 
these algorithms lack reversibility in practical simulations and behave like those of 
a chaotic dynamical system with a Liapunov exponent v ~ 0.75. 



1 Introduction 

Numerical simulations have been proven to be an important tool in obtaining information 
about non-perturbative properties of a physical system. Finding efficient algorithms is therefore 
one of the major research subjects. In particular, improved algorithms for models containing 
fermions, notably QCD, are needed. Up to now, the Hybrid Monte Carlo (HMC) algorithm [1] 
has been the most often used update scheme for fermionic systems. It is an exact Monte Carlo 
method, easily implementable on vector and parallel machines and it has been proven to work 
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well in practice. Still, simulations of fermionic systems turn out to be very difficult and time 
consuming. Improvements on this situation are therefore clearly welcome and alternatives to 
the HMC algorithm should be tried out. 

Recently, a new idea has been put forward by M. Liischer [2] that establishes another exact 
method for QCD simulations. It consists of a "bosonization" of the fermionic system. One 
arrives at a completely local bosonic action. The price to pay is the introduction of a number 
of scalar field copies, where the number of copies typically is (9(100) on a I6 4 lattice. Unfortu- 
nately, unexpected long autocorrelation times have been encountered in practical simulations 
within a QCD like theory [3, 4], where the gauge group was chosen to be SU(2) instead of 
SU(3). So far, it remains to be seen whether the problem of these large autocorrelation times 
can be overcome. 

In this paper, we want to continue the search for fermion algorithms by studying the Kramers 
equation algorithm. This algorithm was proposed by Horowitz [5] already some time ago and is 
a generalization of the HMC algorithm. The main modification is that, in the refreshment of the 
momenta, a term proportional to the momenta themselves is added. It can be made exact by 
introducing a global accept/reject step. At least for a free field theory, it can be shown to have 
a dynamical critical exponent of z = I. However, in contrast to the HMC algorithm, this result 
can be obtained without going to the large trajectory length limit. The hope is therefore that, 
as far as the critical slowing down is concerned, it works equally well as the HMC algorithm. 
However, due to the shorter trajectory lengths, it might consume less computer time. The 
Kramers equation algorithm was introduced and discussed in [5]. A particular implementation 
and further discussion can be found in [6], where a test of the algorithm was performed on 
the 2-dimensional Gross-Neveu model. The results in [6] indicate that the Kramers equation 
performs as well as the HMC algorithm. A test of the Kramers equation algorithm for QCD 
has not been performed so far. 

Here we want to fill this gap and study the Kramers equation algorithm in QCD with 
SU{2) gauge group. The reason why we have chosen SU(2) instead of SU(3) is that we want to 
continue the algorithm tests as initiated in [3, 4] with Liischer's fermion algorithm. Our results 
for Wilson QCD show that, in the actual computer time, the Kramers equation algorithm 
performs at least as well as the conventional Hybrid Monte Carlo algorithm. 

Another aspect which is in favor of the Kramers equation algorithm, is the lack of reversibil- 
ity of the discretized classical equations of motion in the HMC algorithm. Since for lattices 
used in todays simulations, the number of steps within a trajectory can reach (9(100) [7], one 
may fear that due to accumulations of rounding errors, the HMC algorithm lacks its reversibil- 
ity property, necessary for the detailed balance condition. Indeed, in numerical experiments 
violations of reversibility have been observed [8]. In the Kramers equation algorithm, this effect 
is drastically reduced since only one step in the leapfrog integration is used. 

This paper is organized as follows. After defining our model in the next paragraph, we 
recapitulate shortly the HMC algorithm in section 2. In section 3, we introduce the Kramers 
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equation algorithm. Two improvements we have used, namely preconditioning and a better 
leapfrog integration scheme as proposed in [9] are explained in section 4. The performance 
tests of the Kramers equation against the HMC algorithm are presented in section 5. Section 6 
is devoted to the investigation of reversibility in the HMC algorithm and we will conclude in 
section 7. 

The theory that we would like to study is the standard lattice Wilson QCD with gauge group 
SU{2). We will work on a 4-dimensional euclidean space-time lattice with volume = L 3 s L t . 
We introduce gauge fields U^x) £ SU (2) where = 0,1,2,3 designates the 4 forward directions 
in space-time and quark fields tpAaa(x) where A, a and a are flavor, color and Dirac indices, 
respectively. The full partition function for our model is given by, 

Z = J VUV^Vi^exp (S g - S w ) , (1) 

where the gauge action S g and the Wilson fermion action S w are given by: 

S B = -f£Tr(tf P ), 

L p 

S w = J2H x )( D + m M x )- (2) 

X 

The Wilson difference operator D which appears in the above expression is given by (setting 
the Wilson parameter to one): 

d = jE7,(v, + v;)-v,v;, 

V^(s) = U,,(x)^(x + fi) - ijj(x) , 

V;VM = *j>(x) - Ufa - n)4>(x - n) , (3) 

and Up is the usual plaquette term on the lattice. In the following, we will consider two 
degenerate flavors of Wilson fermions. As usual, for the simulations the fermion determinant 
is written in terms of Gaussian scalar fields <f> such that the path integral reads 

z = j vuv^v<p e - s -ff , 

Sell = S g + ^(AftM)-V , (4) 

with M = 2k(D + m) the fermion matrix. The Wilson hopping parameter n is related to the 
bare quark mass via n = (8 + 2m)~ 1 . 



2 Hybrid Monte Carlo Algorithm 

Let q represent some stochastic variable, i.e. the gauge link fields and the scalar fields of 
eq. (4) in our case. Let us furthermore introduce a fictitious Monte Carlo time t and a set 
of momenta p which are conjugate to q. In the HMC algorithm, the system defined by the 
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euclidean action S(q) evolves then according to (stochastic) Hamilton's equations of motion 
which read, considering time to be continuous for the moment, 



SH 
SH 



^ Sq ' 



Here the Hamiltonian H = -^p + S(q) and the initial momenta are obtained from random 
numbers with Gaussian measure of unit variance. A leapfrog integrator [1] with N m d molecular 
dynamics steps and a discrete step size e is used to integrate eqs. (5) numerically. The product 
tN m <i is called the trajectory length and a trajectory consists of N m d molecular dynamics steps. 
In the simulations eN m <i = 0(1). 

In the HMC algorithm, due to the finite step size errors of the leapfrog integration, a global 
accept/reject step at the end of a trajectory is needed to ensure exactness of the algorithm. In 
a free theory, one can show that the acceptance rate of the Hybrid Monte Carlo algorithm is 
given by [10, 11]: 

P acc ~ erfc(ciV m(i e 3 VO) , (6) 

where erfc is the error function and c a constant. If the total trajectory length eN m <i is fixed, 
an increasing e will drive the acceptance rate to zero exponentially. Therefore, one should take 
as large a step size as possible, while keeping the acceptance rate at a reasonably high level. 
To maintain a constant acceptance rate, one also has to scale the step size according to fi -1 / 4 . 
In our simulations, we have tried to maintain the acceptance rate at 80 to 90 percent level. 



3 Kramers equation algorithm 

The Kramers equation algorithm for simulating quantum field theories was introduced and 
described in [5]. It amounts to add a second order time derivative term to the usual Langevin 
equation and originates from the theory of Brownian motion (see [12] for a discussion of the 
Kramers equation in this context). The equations describing the time evolution of the system 
are very similar to eqs. (5) and read 

SH 

P = -— --fp + T](t), 

q = ^' (7) 

where the stochastic variables rj(t) are the so-called "white noise". Of course, eqs. (7) are to 
be understood only on a formal level. We see that the main difference between eqs. (5) and 
eqs. (7) is the white noise term and an addition of a friction term which introduces a new 
tunable parameter 7. 
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In this paper, we adopt a particular variant of the Kramers equation algorithm [5, 6] which 
in its discretized version consists of the following steps. 

1. ) Generate momenta pi according to a Gaussian distribution with zero mean and unit variance 

P( Pl ) oc . (8) 

Given the initial configuration (<7i,pi), 

2. ) update the momenta, using a discretized time step e 

p = e~ ie p 1 + Vl - e~ 2 ^r] , (9) 

where rj is again selected from a Gaussian distribution with zero mean and unit variance. 

3. ) Leapfrog integration 



p = P+ ~F{qi) , 
92 = qi + ep , 

P2 = p+^F(q 2 ) } (10) 

where F = —SS/Sq is the force. 

4.) Perform a Metropolis test, by accepting the candidate configuration with probability 

P(q u p -> q 2 , p 2 ) = min{l, e H (^v)-H( q2 , V2 )^ _ ( U ) 

On rejection, set 

?2 = ?1 , P2 = ~P, (12) 

where the negation of the momenta is necessary to guarantee exactness of the algorithm. In the 
appendix, we show that the above scheme fulfills the stability criterion, which -in combination 
with ergodicity- ensures the convergence to the ground state probability distribution [13]. In 
the simulations, steps 2.)-4.) are repeated &-times before the momenta are refreshed again in 
step 1.). In our simulations, a value of k = 4 was chosen. Obviously, in the limit 7 = 00 and 
k = 1, the above scheme reduces to a one step Hybrid Monte Carlo algorithm. 

A continuum free field analysis of eqs. (7) [5, 6] reveals that the exponential autocorrelation 
time behaves as 1 /cu m8n , where w m ,- n is the slowest mode of the system, i.e. the inverse correlation 
length £. This behavior, corresponding to a dynamical critical exponent of z = 1 as in the HMC 
algorithm, is assumed at 7 = 2u m i n . 

The remarkable property of the Kramers equation algorithm is that this result can be 
obtained already for short trajectory lengths, whereas for the HMC algorithm, a value of z = 1 
can only be reached in the large trajectory length limit [11, 14]. This property of the Kramers 
equation algorithm has important consequences. First, one obviously saves computer time per 
trajectory. Second, since we have only one step in the trajectory, effects of the (non)-reversibility 
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in the HMC algorithm (see Section 6) are drastically reduced. Third, according to the free field 
analysis of the tuning of the step size (6), the step size is expected to scale like e oc fi -1 / 6 . This 
softer volume dependence of the step size is indeed seen in our practical simulations. 

The drawback, of course, is that the trajectory lengths become shorter, resulting in larger 
autocorrelation times. Whether the advantages mentioned above will merit the increase of the 
autocorrelation times, will be investigated in section 5, where we give results for the performance 
of the HMC and the Kramers equation algorithms in simulations of QCD with gauge group 
SU{2). In the next section, we will first explain two improvements that we have implemented 
for both the HMC and the Kramers equation algorithms. 



4 Improvements 



Even-odd Preconditioning 

Preconditioning [15] is by now standard for simulations in QCD. We write the fermion 
matrix as 

M = ( 1 " KDe ° ) . (13) 
\ -kD 0£ 1 ) 

The nondiagonal part of the fermion matrix D eo only connects the odd lattice points with even 
lattice points and similarly the matrix D oe only connects the even lattice points to the odd 
lattice points. The preconditioned matrix M is now 

M={ 1 ° ) • (14) 

^ 1 - k'D„D„ ) 

and the path integral (4) can be written equally in terms of the preconditioned matrix M. 

The preconditioned matrix has two advantages over the original fermion matrix. The first 
is a reduction of the memory requirement. Since M only connects odd with odd (or even with 
even) sites, we save a factor of two in the memory requirement. The second advantage is that 
the matrix M^M, which is used in the simulations, is better conditioned than the original 
matrix M ] M . 

Let us denote X max and A to be the largest and the lowest eigenvalue of the matrix M^M 
and X max , Ao will correspond to the largest and the lowest eigenvalue of the matrix M^M . One 
can show that 

\[KZ, = \\M\\ < 1 + k||Z>|| < 1 + 8k , (15) 



and similarly y X max = \\M\\ < 1 + 64k 2 , where the notation |.| stands for the / 2 -norm of 
the matrix [16]. The ratio (1 + 64/€ 2 )/(l + 8k) is close to one for all practical values of n in 
numerical simulations. Therefore, when comparing the condition number X max l X of the matrix 
M^M with X max / Xo of the preconditioned matrix ikf^M, the main difference comes from the 
ratio of the lowest eigenvalue in the two cases, assuming that the largest eigenvalue is close 
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Table 1: Comparison of the average lowest eigenvalue for the original matrix M^M 
(< A >) and the preconditioned matrix M^M (< A >). Ncg is the average 
number of Conjugate Gradient iterations to reach a given residue and < Up > is 
the average plaquette value. The subscript old stands for the original and pre for 
the preconditioned matrix. 



K 


< A > 


< Up t0 i d > 


Ncg, old 


< Ao > 


< Up >pre > 


Nc<G,pre 


0.150 


0.3650(103) 


0.449(2) 


57 


1.267(22) 


0.4469(3) 


35 


0.160 


0.1960(20) 


0.461(2) 


76 


0.7520(10) 


0.4607(4) 


46 


0.170 


0.1210(50) 


0.490(2) 


95 


0.3893(87) 


0.4954(4) 


54 



to the bound given above. It is expected that the lowest eigenvalue A of the preconditioned 
matrix M^M is about a factor of 4 larger than the lowest eigenvalue of the original matrix 
M^M. This is motivated by noticing that for any eigenvalue A of the matrix M, (2 A — A 2 ) 
is an eigenvalue of the matrix M. Therefore, when |A| is small, the absolute value of the 
corresponding eigenvalue of M is about a factor of two larger than that of M. However, since 
the eigenvalues of M are not directly related to the eigenvalues of M^M, we have to test this 
expectation numerically. Indeed, the effect is confirmed by our numerical simulation results. 
Since the number of Conjugate Gradient iterations is proportional to the square root of the 
condition number, we expect to save about a factor of two in computer time. 

We have tested the even-odd preconditioned version of the HMC algorithm on a small 
lattice (4 4 ) for various n values at f3 = 1.75 and compared with the version using the original 
matrix M^M. The results are summarized in table 1. The lowest eigenvalue A (A ) of the 
matrix M^M (M^M) was measured by minimizing the Ritz functional, fJ,(ip) = | \Mif>\ | 2 /| \if>\ | 2 , 
using a Conjugate Gradient technique [17, 18]. One sees that indeed the average value of the 
lowest eigenvalue < A > of M^M is always about a factor of 3.5 larger than < A > of the 
original matrix M^M. Correspondingly, the number of Conjugate gradient iterations Ncg for 
each inversion of the matrix is decreased. The average plaquette values < Up > in both cases 
are also listed for comparison. A similar improvement was also found in the larger volume 
simulations. For the lattice size of 6 3 12 and f3 = 2.12, n = 0.15, we found the average value 
< A > to be larger by about a factor of 3.6 as compared to < A >. 

Sexton- Weingarten Integration 

Another improvement can be made to the leapfrog integration scheme used in the conven- 
tional Hybrid Monte Carlo algorithm. This has been suggested by Sexton and Weingarten 
in [9]. The basic idea is the following. The force F needed for the update of the momenta 
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consists of two parts. One originates from the gauge staples and the other from the fermionic 
force. While the pure gauge part needs only moderate computer time, the fermionic force part 
includes a matrix inversion and consumes considerably more CPU time. 

One may, however, introduce different time scales for the leapfrog integration corresponding 
to the two force terms. For the pure gauge part, the leapfrog can be done with finer time steps. 
Although we then have to do more arithmetic operations for this part, we finally gain in the 
increase of the acceptance rate due to partial cancellations of the finite step size errors coming 
from the force. 

Consider the Hamiltonian 

H= l -p 2 + S 1 (q) + S 2 (q) 1 (16) 

for which the number of arithmetic operations required to evaluate the force due to Si is far 
less than that due to S 2 . Then, the partial time evolution operators are given by 

Ti(e) = exp(ie J L(5'i))exp(e J L(ip 2 ))exp(ie J L(5'i)) , 

T 2 (e) = exp(eL(S 2 )). (17) 

In eq. (17) L is a linear operator, representing a symplectic integrator [9, 6]. Sexton and 
Weingarten suggest that one can define a full time evolution operator T(e) by: 



n 



r 2 (|). (is) 



As has been shown in [9], errors induced by finite time step sizes will be reduced by the above 
method due to finer step sizes, leading to a better integration scheme. A more complicated 
scheme can be constructed by adding one more T 2 insertion in each step. The scheme we used 
is given by the following formal expression: 



T(e) = T 2 (|) 
= ^Wn 



-1 n — l 

U^- )T 3 (f )T fc (f )T 3 (f ) T fc (f)T 3 (f )T fc (f)T 3 (-f- ) . (19) 
4n on 4n on J 4n on 4n Yin 



In the above formula, a factor of T 2 (e) stands for an update of the momenta by a step e, taking 
into account the force coming from the fermionic part only. Similarly, T g (e) stands for an 
update of the momenta by a step e due to the gauge part alone and Tfc(e) stands for an update 
of the gauge fields. 

In the practical simulation, the integer n is chosen to be around 4. Increasing this number 
will not result in a further improvement of the acceptance rate [9]. The total trajectory length 
in such a sequence of updates is t = eN m <i and the number of matrix inversions is equal to 
(2N m <i + 1). More complicated schemes than the above will require more T 2 insertions in each 
step and will not lead to further improvements. 

The tests of the above scheme as given in eq. (19) on the small lattice (4 4 ) runs turned out 
to be quite promising. In table 2, we have listed the results of the tests for the conventional 
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Table 2: Comparison of the Sexton- Weingarten integration scheme 
with the conventional leapfrog scheme. The lattice size is 4 4 with 
f3 = 1.75 and n = 0.15. The total trajectory length has been fixed 
to one. 



Conventional Leap-frog Integration Scheme 


e 


N m d 


Niter 


p 

± acc 


iter 1 Pace 


0.05 


20 


691(2) 


0.97(1) 


712(07) 


0.0625 


16 


576(5) 


0.95(1) 


608(10) 


0.0833 


12 


454(5) 


0.91(1) 


499(10) 


0.10 


10 


389(5) 


0.88(2) 


442(15) 


0.111 


9 


362(2) 


0.83(2) 


437(10) 


0.125 


8 


329(4) 


0.78(4) 


421(20) 


0.167 


6 


265(2) 


0.62(2) 


427(20) 


0.20 


5 


235(4) 


0.52(3) 


451(25) 


Sexton- Weingarten Integration Scheme 


0.10 


10 


735(10) 


1.00(1) 


735(10) 


0.125 


8 


606(10) 


1.00(1) 


606(10) 


0.167 


6 


472(08) 


1.00(1) 


472(08) 


0.20 


5 


405(05) 


0.99(2) 


410(08) 


0.25 


4 


331(03) 


0.97(3) 


341(10) 


0.333 


3 


262(03) 


0.95(3) 


275(10) 


0.50 


2 


195(02) 


0.66(2) 


295(12) 



integration scheme and the Sexton- Weingarten scheme for the HMC algorithm. These simula- 
tions were done on a 4 4 lattice with f3 = 1.75 and n = 0.15. We fixed the trajectory length to 
be one in all these runs, which is a reasonable value for the practical runs as well. Then, we 
systematically changed the number of steps of the trajectory hence the step size. The num- 
ber of conjugate gradient iterations needed for each trajectory (Ni ter ) and the acceptance rate 
(Pace) of the runs are listed in table 2. The ratio Ni ter / P acc should be minimized for optimal 
performance. From table 2 we see that, for the conventional leapfrog integration scheme, the 
best performance is reached at about 80 percent acceptance rate. The corresponding value of 
Niter /Pace ls about 420. The Sexton- Weingarten integration scheme, however, can achieve a 
Niter/ Pace ratio of about 270, which is about 50 percent better than the conventional scheme. 
Using this new integration scheme, we can increase the step size quite substantially. The effect 
of these improvements on larger lattices is expected to be more pronounced [9]. 
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Table 3: Average plaquette value < U p >, average lowest eigenvalue < A > of 
M^M, the pion and p meson masses at k = 0.15, f3 = 2.12 for 6 3 12 and 12 4 lattices. 
The statistics is given in terms of measured trajectories. For the Kramers equation 
algorithm only every sixth (fourth) trajectory was measured on the 6 3 12 (12 4 ) lattice. 
For the HMC algorithm, every trajectory was measured. 



Algorithm 


L 3 s L t 


Trajectories 


<u p > 


< Ao > 




m p 


Kramers 


6 3 12 


1060 


0.5803(2) 


0.0251(2) 


1.225(8) 


1.317(9) 


HMC 


6 3 12 


2080 


0.5800(2) 


0.0253(2) 


1.213(7) 


1.299(9) 


Kramers 


12 4 


1695 


0.5779(3) 


0.01472(13) 


1.030(7) 


1.098(10) 


HMC 


12 4 


1060 


0.5777(3) 


0.01492(10) 


1.047(10) 


1.113(10) 



5 Performance tests 

In the implementation of the HMC and Kramers equation algorithms, we adopted the $ algo- 
rithm [19]. The simulations were performed on the Quadrics (APE) machines at DESY, Ql 
with 8 nodes and QH2 with 256 nodes. Since these machines have only 32 bit precision, we 
used a Kahan summation technique [20, 21] to achieve accurate results for scalar products and 
other global summations. We also tried the biconjugate gradient method. However, similar to 
the conclusions in [6], we did not find an improvement on the APE computer. 

As mentioned in the introduction, we want to continue the search for an alternative to 
and hopefully better algorithm than the HMC algorithm. We therefore performed our tests 
at the same parameter values as chosen in [3, 4], i.e. f3 = 2.12, n = 0.15 and lattice sizes of 
6 3 12, 12 4 and 16 4 . For the lattice size 16 4 , we only ran the Kramers equation algorithm. We 
have measured the average plaquette value < U p >, the average lowest eigenvalue < A > of 
the preconditioned matrix M^M, the pion correlation functions and the p meson correlation 
functions in our runs. When we give values of the pion mass m v and the p meson mass m p , we 
use as a definition 

^-'-Ww 1 ' (20) 

where C is the correlation function for the pion or p meson. At the chosen values of f3 and /c, 
we obtain a ratio of m^/nip 0.9. 

In order to get an estimate for the performance of the Kramers equation and the HMC 
algorithms, we have studied the integrated autocorrelation times of the plaquette U p , the lowest 
eigenvalue A of the preconditioned matrix M^M , the pion C\ and the p meson C p correlation 
functions at distance L t /2 + 1. We list the values of < U p >, < A >, m v as well as m p in 
table 3 for both algorithms, demonstrating that they give consistent results. 

In an ideal situation, the integrated autocorrelation time Ti nt of some observable O is ob- 
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tained via 




E M*)/r o (0), 



(21) 



where To(t) is the connected autocorrelation function of the observable O. In reality, the 
summation can, of course, only be taken over a finite data set. Summing over the whole 
data set, assuming that the statistics is much larger than the autocorrelation time, may also 
be misleading, since the noise in the tail of the autocorrelation functions gives an unwanted 
bias. Therefore, we have used the "window" method suggested by Sokal [13] to extract the 
integrated autocorrelation times. This method basically suggests that one should only sum 
the autocorrelation function up to a certain distance T cut} thereby introducing the integrated 
autocorrelation function 



In practice, the integrated autocorrelation time of some observable (9, f 8nt ((9), is determined 
by searching for a plateau behavior of the corresponding integrated autocorrelation function in 
the range of T cut /fi nt (0) ~ 4 — 10. The cut parameter T cut should also satisfy T tota i/T cut ^> 1, 
where T tota i is the total number of measurements. In Fig. 1, we plot the integrated autocorre- 
lation function for A to give an example. One indeed observes a plateau behavior where the 
integrated autocorrelation function does not depend on the value of T cut . 

In table 4, we give the results for the integrated (f 8nt ) and exponential [r exv ) autocorrela- 
tion times for the plaquette value U p and the pion correlation function CV at distance L t /2 + 1 
measured per trajectory. For the runs on the 6 3 12 lattice, we ran eight copies of the program on 
single nodes of the APE Ql machine. This allowed us to determine the integrated autocorre- 
lation times for each copy independently, from which we obtained the error quoted in table 4. 
For the 12 4 and 16 4 lattices, the error was obtained by splitting the total run into smaller parts 
and analyze these parts independently. 

We checked the values of the integrated autocorrelation times as given in table 4 by a 
blocking analysis of the error of the observables. Our statistics is sufficiently large that we can 
reach an error plateau. The integrated autocorrelation times determined in this way are in 
complete agreement with the ones obtained from the integrated autocorrelation functions. In 
addition, we determined the exponential autocorrelation times T exp by fitting the time behavior 
of the autocorrelation functions to an exponential. Again, T exp fi nt (see table 4) in all cases. 

In the last column of table 4, we give the product of the integrated autocorrelation time 
Tint{U p ) for the plaquette value and the average number of Conjugate Gradient iterations per 
trajectory. This quantity gives a direct estimate for the computer time consumption of both 
algorithms and hence should be used for comparison. 

Two comments are in order concerning table 4. First, comparing the results for the au- 
tocorrelation times on the 6 3 12 and the 12 4 lattices, one notices fi nt (U p ) to be smaller on the 
larger lattice. This demonstrates that, by a more careful tuning of the parameters, substantial 




t=i 




11 




Figure 1: We plot the integrated autocorrelation function for the lowest eigenvalue 
A of M^M as obtained from the Hybrid Monte Carlo algorithm on a 6 3 12 lattice 
at /3 = 2.12, k = 0.15. 



improvements can be achieved. As we started our investigation with the 6 3 12 lattice, we did 
not choose the optimal values for the parameters. 

Second, on the 12 4 lattice, we ran the Kramers equation algorithm at two values of 7: (a) 
7 = 0.5 and (b) 7 = 2.0. The behavior of the integrated autocorrelation function is given in 
Figure 2. It shows that tuning of this free parameter 7 in the Kramers equation algorithm 
can change the autocorrelation times by almost a factor of two. It is interesting to note that, 
whereas the autocorrelation times of the fermion correlators are somewhat increased, the ones 
for Up and A are decreased. We found a similar effect of the tuning of 7 in runs on 8 3 12 
lattices. Again, while increasing 7, the autocorrelation times for the fermionic correlators 
increased whereas the ones for U p and A dropped. 
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Table 4: Results for the autocorrelation times. The entry in the last column 
Tint{U p )NcG gives the product of the plaquette integrated autocorrelation time and 
the average number of Conjugate Gradient iterations per trajectory. For the 12 4 
lattice we ran the Kramers equation algorithm with two values of 7: (a) 7 = 0.5 



and (b) 7 = 2.0. 



Algorithm 


L 3 s L t 


T int (Up) 


Tint ( C v ) 
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6 Reversibility 

The detailed balance proof of the HMC algorithm requires exact reversibility [1] of the dis- 
cretized equations of motion. Of course, on a computer, the algorithm runs with only a finite 
precision and one may wonder, whether accumulations of rounding errors can lead to violations 
of the exact reversibility. This is particularly true for todays simulations where (9(100) mole- 
cular dynamics steps are used to obtain one trajectory [7]. One may investigate this question 
by taking an initial configuration C™, integrating it along a trajectory and then integrating it 
back to reach the end configuration C en d- Indeed, measuring several observables on d n i and 
C en d discrepancies were encountered [8]. 

We decided to study this problem in numerical simulations using the HMC algorithm. The 
nonlinear nature of Hamilton's equations of motion that are used in the HMC algorithm suggests 
that the dynamics is chaotic. To test this proposal, let us define a quantity, ||e?£7||, to measure 
the difference between two gauge field configurations 



Here U^(x) } V^(x) are two SU(2) gauge link variables with lattice point index x, direction // 
and group index a. 

To show that the suggested chaotic behavior is really a property of Hamilton's equations of 
motion, we proceeded in the following way. Given a gauge field configuration obtained in the 
course of some run, we added a small noise SU^x) to the gauge field variable U^x), such that 
V fl (x) = U l j i (x)-\-SU l j i (x). Then, we took both configurations and iterated them according to the 
leapfrog integration scheme used in the HMC algorithm. We measured 1 1 c?f7 1 1 after some number 




(23) 



13 




Figure 2: Integrated autocorrelation function on the 12 4 lattice for two values of 7, 
(a) 7 = 0.5, (b) 7 = 2.0. The integrated autocorrelation function is plotted in units 
of measurements, which we performed every fourth trajectory for this run. 



of steps N m <i in the leapfrog integration. If the system is chaotic, we expect that asymptotically 

(eN md > 1) 

\\dU\\= Ae vtN ™ d . (24) 

In eq. (24) v is the -to be determined- Liapunov exponent, characterizing a chaotic system and 
e is the step size used in the program. 

In Fig. 3a, we show /(/(||g??7||) as a function of eN m( i } where Ig stands for the 10-based 
logarithm. Clearly, an asymptotic linear behavior is seen, giving a Liapunov exponent of 
v = 0.78. The step size e was chosen to be e = 0.01 and the lattice is = 4 4 . The data are 
obtained by averaging over 10 independent gauge held configurations. Errors are smaller than 
the symbol size. 

Obviously, the noise that has been added is faking some rounding errors that occur in the 
actual simulations. If such rounding errors appear, we predict then that the errors blow up 
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Figure 3: We plot /(/(||g?[/||) as defined in eq. (23). In (a) ||e?£7|| is obtained by 
adding a small noise SU to the initial configuration and then iterate the leapfrog 
integration. In (b) ||e?£7|| is obtained by reversing the trajectory. 

exponentially according to eq. (24) with the Liapunov exponent as found above. 

To see whether this effect really happens, we performed a Monte Carlo run with our dy- 
namical Wilson fermion program as described above on 4 4 lattices. Every third trajectory, we 
reversed the time and integrated back, measuring 1 1 c?f7 1 1 using the initial configuration before 
starting the leapfrog integration and the final configuration at the end of the reversed trajec- 
tory. By fixing the step size e = 0.03, we obtained 1 1 c?f7 1 1 as a function of the trajectory length 
tN m d- Indeed, we find in Fig. 3b again a linear behavior of lg{\\ dU\\) with a Liapunov exponent 
that is compatible with the previous one. 

These figures were obtained from runs on workstations with 32 bit arithmetic. We repeated 
the runs with 64 bit arithmetic. We found the same asymtotic exponential behavior with the 
only difference that the amplitude A in eq. (24) gets smaller and the asymptotic behavior 
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sets in much later. We also measured the Liapunov exponent on the APE machine which has 
32 bit arithmetic but where we used double precision or Kahan summations [20, 21] to avoid 
rounding errors in scalar products. Again, we find a Liapunov exponent around v = 0.75. 
Notice that, for smaller trajectory lengths, the value of ||e?£7|| is less than what the asymptotic 
formula (24) predicts. For example, for a 4 4 lattice, when comparing the value of ||e?£7|| at 
trajectory length one (typical for HMC algorithm) and at trajectory length 0.1 (typical for 
the Kramers equation algorithm), we found a factor of 2.5 difference. This factor will increase 
further for larger lattices. From this point of view, the Kramers equation algorithm appears to 
be a safer algorithm than the HMC algorithm. 

To see whether the Liapunov exponent is really representing the chaotic behavior in the 
continuum time, we tried to extract it for different values of the integration step size e. We 
find that v is independent of the step size, provided the step size is small enough. Therefore, 
we expect the Liapunov exponent to represent the property of the continuum time integration. 

We have also investigated the dependence of the Liapunov exponent v on bare parameters. 
On the 4 4 lattice, we have performed the study of the quantity 1 1 c?f7 1 1 at various k values for 
f3 = 1.75. We found that the Liapunov exponent v has very weak dependence on k. Runs on 
a larger 6 3 12 lattice revealed again a value of v ~ 0.75, strengthening our conclusion that it is 
representing the continuum time chaotic behavior of the classical equations of motion. 

7 Conclusion 

In this paper, we compared the performance of the HMC and the Kramers equation algo- 
rithms in simulations of QCD with two flavors of dynamical Wilson fermions and gauge group 
SU(2). Adopting the same improvements for both algorithms, preconditioning and the Sexton- 
Weingarten [9] leapfrog integration scheme, we found that the Kramers equation algorithm 
performs as well as the HMC algorithm, as can be read off from table 4. Although the in- 
tegrated autocorrelation time is larger for the Kramers equation algorithm (see table 4), the 
average number of Conjugate Gradient iterations per trajectory is much less, resulting in a 
comparable performance for the Kramers equation algorithm. Therefore, we think that the 
Kramers equation algorithm should be used more often in QCD simulations so as to obtain 
more experience with it. In particular, the tuning of the parameter 7 in the Kramers equation 
algorithm seems to be important to obtain optimal performance for larger lattices. As the de- 
scription of the Kramers equation algorithm in section 3 shows, it is easy to change an existing 
HMC code to a Kramers equation one. 

We demonstrated that Hamilton's equations of motion used in the HMC algorithm, due to 
their nonlinear nature, behave like those of a chaotic dynamical system. Therefore, if rounding 
errors occur during a simulation, they accumulate and blow up exponentially oc exp{veN m( i} 
with eN m <i the trajectory length and v the Liapunov exponent which we determined to be 
v ~ 0.75. Therefore, the HMC algorithm may lack the reversibility property. Since in the 
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Kramers equation algorithm, a trajectory consists ol only one step, this effect is substantially 
reduced. 
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A Appendix: Stability of the Kramers equation algo- 
rithm 



For any algorithm it is sufficient to show that the transition probability P satisfies the irre- 
ducibility and the stationarity condition in order to guarantee the convergence to the equilibrium 
distribution [13]. Due to the refreshment of the momenta the algorithm is ergodic as is the 
HMC algorithm. Here we show that also the stability condition is fulfilled for the Kramers 
equation algorithm. In [6] it was shown that the Kramers equation algorithm even fulfills de- 
tailed balance and we therefore give the stability proof only as an additional part to make the 
paper self-contained. 

The stability criterion reads 

' dzir(z)P(z ^ y) = ir(y) , (25) 

where we denote with tt(z) the equilibrium distribution, z = (q,p) is a state in the algorithm 
in the field and momenta variables and P(z — > y) is the transition probability, which in case of 
the Kramers algorithm reads 

P(Qi, Pi 92, P2) (26) 
dpS(q 2 - I Q (q 1 ,p 1 ))S(p - Ip(q 1 ,p 1 )) 



'2tt(1 - x 2 
+ / dp6(q 2 - qi)S(p + px) 



2(l-x 2 ) 



(l -min{l,e g ( gl ' Pl )- g ( f q( gl ' Pl )' ff ( gl ' P1 )}) . . (27) 

V 1 I2tt(1-x 2 ) 



In (26) < x < 1 is arbitrary. Iq(p) correspond to the leapfrog integrators as described 
in the text. The only important property of them is that they are linear operators with exact 
time-reversal symmetry. With the transition probability as given in (26), it is easy to verify 
the stability criterion: 

j dq 1 dp 1 dpe- H ""'">P(q 1 , p 1 - q 2 ,p 2 ) (28) 



e 2(1-^) 
2w(l - x 2 



min{e-- ff(9l ' Pl) ,e-- ff(92 ' p) } 



+ J dq 1 dp 1 dp6(q 2 - qi)6(p + p t ] 
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C 2(1-^) 



2tt(1 - X 



2^ 



dpmm{e- H ^ q "- p ^ Ip ^'-^\e- H ^}- 6 ^"^ 



2tt(1 - 



+ Jdpe 



rr, , C 2(l-x2) 
-H(q 2 ,-p) 



2tt(1 - 



^minje-^ 2 '-^^-^^ 2 '-^'^^ 2 '-^} f ^ * ) , (29) 

2tt(1 - x 2 ) 



where in the first step we have used the fact that the integrators are reversible under a sign 
change of the momenta. Since the terms containing the minimum conditions cancel, and the 
Hamiltonian is quadratic in the momenta, we arrive at 



d -H(q 2 ,p) C 2(1 X ) = e -H(q 2 ,p 2 ) _ / 30 X 

2w(l - x 2 ) 



Hence the stability condition is shown. 
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